Summarizing Definition from Wikipedia

نویسندگان

  • Shiren Ye
  • Tat-Seng Chua
  • Jie Lu
چکیده

Wikipedia provides a wealth of knowledge, where the first sentence, infobox (and relevant sentences), and even the entire document of a wiki article could be considered as diverse versions of summaries (definitions) of the target topic. We explore how to generate a series of summaries with various lengths based on them. To obtain more reliable associations between sentences, we introduce wiki concepts according to the internal links in Wikipedia. In addition, we develop an extended document concept lattice model to combine wiki concepts and non-textual features such as the outline and infobox. The model can concatenate representative sentences from non-overlapping salient local topics for summary generation. We test our model based on our annotated wiki articles which topics come from TREC-QA 2004-2006 evaluations. The results show that the model is effective in summarization and definition QA.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Extending DBpedia with List Structures in Wikipedia Articles

Ontologies are the basis of the Semantic Web. Owing to the cost of their construction and maintenance, however, there is much interest in automating their construction. Wikipedia is considered a promising source of knowledge because of its own characteristics. DBpedia extracts a large amount of ontological information from Wikipedia. However, DBpedia focuses exclusively on infoboxes (i.e., tabl...

متن کامل

"The sum of all human knowledge": A systematic review of scholarly research on the content of Wikipedia

Wikipedia might possibly be the best-developed attempt thus far of the enduring quest to gather all human knowledge in one place. Its accomplishments in this regard have made it an irresistible point of inquiry for researchers from various fields of knowledge. A decade of research has thrown light on many aspects of the Wikipedia community, its processes, and content. However, due to the variet...

متن کامل

Generating Wikipedia by Summarizing Long Sequences

We show that generating English Wikipedia articles can be approached as a multidocument summarization of source documents. We use extractive summarization to coarsely identify salient information and a neural abstractive model to generate the article. For the abstractive model, we introduce a decoder-only architecture that can scalably attend to very long sequences, much longer than typical enc...

متن کامل

WikiTopics: What is Popular on Wikipedia and Why

We establish a novel task in the spirit of news summarization and topic detection and tracking (TDT): daily determination of the topics newly popular with Wikipedia readers. Central to this effort is a new public dataset consisting of the hourly page view statistics of all Wikipedia articles over the last three years. We give baseline results for the tasks of: discovering individual pages of in...

متن کامل

Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles

When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human level intelligence, it is inevitable to use the knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009